14 research outputs found

    A BERT-based dual embedding model for Chinese idiom prediction

    Chinese idioms are special fixed phrases usually derived from ancient stories, whose meanings are oftentimes highly idiomatic and non-compositional. The Chinese idiom prediction task is to select the correct idiom from a set of candidate idioms given a context with a blank. We propose a BERT-based dual embedding model to encode the contextual words as well as to learn dual embeddings of the idioms. Specifically, we first match the embedding of each candidate idiom with the hidden representation corresponding to the blank in the context. We then match the embedding of each candidate idiom with the hidden representations of all the tokens in the context through context pooling. We further propose to use two separate idiom embeddings for the two kinds of matching. Experiments on a recently released Chinese idiom cloze test dataset show that our proposed method performs better than the existing state of the art. Ablation experiments also show that both context pooling and dual embedding contribute to the improvement in performance. Comment: COLING 202
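The two matching steps described in the abstract can be sketched with stand-in vectors; the random hidden states, the use of mean pooling for context pooling, and the additive score combination are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_dim = 8
seq_len = 5          # context tokens, one of which is the blank
blank_pos = 2
num_candidates = 4   # candidate idioms

# Stand-ins for BERT hidden states of the context tokens
hidden = rng.normal(size=(seq_len, hidden_dim))

# Dual embeddings: one table is matched against the blank position,
# the other against the pooled context representation
blank_emb = rng.normal(size=(num_candidates, hidden_dim))
context_emb = rng.normal(size=(num_candidates, hidden_dim))

# Matching 1: candidate idiom embedding vs. hidden state at the blank
blank_scores = blank_emb @ hidden[blank_pos]

# Matching 2: candidate idiom embedding vs. pooled context
# (mean pooling here stands in for the paper's context pooling)
pooled = hidden.mean(axis=0)
context_scores = context_emb @ pooled

# Combine the two matching scores and pick the best candidate
scores = blank_scores + context_scores
best = int(np.argmax(scores))
```

Using two separate embedding tables lets each matching signal specialize, which is what the ablation on dual embedding probes.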

    Efficient organic solar cells enabled by simple non-fused electron donors with low synthetic complexity

    Abstract Fused-ring electron donors boost the efficiency of organic solar cells (OSCs), but they suffer from high cost and low yield owing to their large synthetic complexity (SC > 30%). Herein, the authors develop a series of simple non-fused-ring electron donors, PF1 and PF2, which alternately consist of furan-3-carboxylate and 2,2′-bithiophene. Note that PF1 and PF2 present a very small SC of 9.7% owing to their inexpensive raw materials, facile synthesis, and high synthetic yield. Compared to their all-thiophene-backbone counterpart PT-E, the two new polymers feature a larger conjugated plane, resulting in higher hole mobility, especially a value up to ≈10⁻⁴ cm² V⁻¹ s⁻¹ for PF2 with its longer alkyl side chain. Meanwhile, PF1 and PF2 exhibit a larger dielectric constant and deeper electronic energy level versus PT-E. Benefiting from these better physicochemical properties, the efficiencies of PF1- and PF2-based devices are improved by ≈16.7% and ≈71.3% relative to those of PT-E-based devices, respectively. Furthermore, optimized PF2-based devices introducing PC71BM as a third component deliver a higher efficiency of 12.40%. The work not only indicates that furan-3-carboxylate is a simple yet efficient building block for constructing non-fused-ring polymers but also provides a promising electron donor, PF2, for the low-cost production of OSCs. A simple non-fused-ring electron donor, PF2, alternately consisting of furan-3-carboxylate and 2,2′-bithiophene, presents a very small synthetic complexity of 9.7% as well as a low material cost of ≈$19.0 g⁻¹. More importantly, PF2 delivers a high efficiency of 12.4% coupled with strong operational stability.

    HiJoNLP at SemEval-2022 Task 2: Detecting Idiomaticity of Multiword Expressions using Multilingual Pretrained Language Models

    This paper describes an approach to detect idiomaticity solely from the contextualized representation of a MWE over multilingual pretrained language models. Our experiments find that larger models are usually more effective in idiomaticity detection. However, using a higher layer of the model may not guarantee better performance. In multilingual scenarios, the convergence of different languages is not consistent, and rich-resource languages have a big advantage over other languages.

    Learning and evaluating Chinese idiom embeddings

    No full text

    Exploring and adapting Chinese GPT to pinyin input method

    No full text
    While GPT has become the de facto method for text generation tasks, its application to the pinyin input method remains unexplored. In this work, we make the first exploration of leveraging Chinese GPT for the pinyin input method. We find that a frozen GPT achieves state-of-the-art performance on perfect pinyin. However, the performance drops dramatically when the input includes abbreviated pinyin. One reason is that an abbreviated pinyin can be mapped to many perfect pinyin, which in turn link to an even larger number of Chinese characters. We mitigate this issue with two strategies: enriching the context with pinyin and optimizing the training process to help distinguish homophones. To further facilitate the evaluation of pinyin input methods, we create a dataset consisting of 270K instances from 15 domains. Results show that our approach improves performance on abbreviated pinyin across all domains. Model analysis demonstrates that both strategies contribute to the performance boost. Comment: To appear in ACL 202
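The ambiguity of abbreviated pinyin described above can be illustrated with a toy lexicon; the syllable-to-character entries below are hypothetical examples, not a real IME dictionary:

```python
# Toy lexicon mapping perfect pinyin syllables to example characters
# (illustrative entries only, not a real IME dictionary)
lexicon = {
    "shi": ["是", "时", "事"],
    "shu": ["书", "数"],
    "sha": ["沙"],
}

def expand_abbreviated(initial: str) -> dict[str, list[str]]:
    """Map an abbreviated pinyin (initial letters only) to every perfect
    pinyin it could stand for, with the characters each one covers."""
    return {p: chars for p, chars in lexicon.items() if p.startswith(initial)}

# A single abbreviation fans out to several perfect pinyin,
# and each perfect pinyin covers multiple homophonous characters
candidates = expand_abbreviated("sh")
num_chars = sum(len(chars) for chars in candidates.values())
```

Even in this tiny lexicon the abbreviation "sh" covers three perfect pinyin and six characters, which is the kind of fan-out that makes abbreviated pinyin much harder for the model than perfect pinyin.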